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Abstract 

Tunnelling between different topological sectors with dynamical chiral fermions is 
difficult because of a poor mass scaling of the pseudo-fermion estimate of the deter- 
minant. For small fermion masses it is virtually impossible using standard methods. 
However, by projecting out the small Wilson eigenvectors from the overlap operator, 
and treating the correction determinant exactly, we can significantly increase the 
rate of topological sector tunnelling and reduce substantially the auto-correlation 
time. We present and compare a number of different approaches, and advocate a 
method which allows topological tunnelling even at low mass with little addition to 
the computational cost. 
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1 Introduction 



In recent years the statistical precision of lattice simulations has reached such 
a high level that the complete control of systematic uncertainties became 
an issue of central importance. In this respect the overlap fermion action[l,2] 
would be the preferred choice, were it not for its massive computational needs. 
Unlike other formulations of lattice QCD it has an exact lattice chiral symme- 
try [3] and index theorem such that the combined continuum and chiral limits 
should not hold any unpleasant surprises. Pioneering investigations [4,5,6,7] 
showed, however, that such simulations are faced with quite formidable chal- 
lenges. While the rapid progress of massively parallel computers will alleviate 
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the problems [8] the feasibility of really controlled simulations requires some 
algorithmic break-through. In this paper we shall focus on one of the main 
algorithmic issues, of considerable concern in the literature and present a pos- 
sible solution. 



The overlap operator is defined as 

D = (l + p) + (l-p) l5 e(Q). (1) 

Q is constructed from any doubler-free Dirac operator Dy/ which fulfils 75- 
hermiticity according to Q — 7 5 _D W /(— p). For the Wilson Dirac operator this 
gives 

Qxy = 75 [Sxy - « ((1 - l^U^{x)5y tX+fl + (1 + "f^U^X - p)8 y ^^j\ . (2) 

p and p are mass parameters related to the bare fermion mass by mbare = 2 pp. 
(Note that different definitions of p are used in the literature.) In this work, 
we will always use the Wilson Dirac operator as kernel with p = 1.5; k = 
1/(8 - 2p) = 0.2. 

The matrix sign function is defined as 

e(g)=El^X^|sign(AO, (3) 

i 

where and Aj are the eigenvectors and eigenvalues of Q respectively, and 
the sum is over the complete set of eigenvectors. In practice, given that the 
calculation of the entire eigenvalue spectrum is impractical, one combines an 
approximation to the sign function, such as the Zolotarev rational approx- 
imation [9] for the bulk of the eigenvalue spectrum, with the spectral de- 
composition for the eigenvalues closest to zero, where no approximation can 
(realistically) be accurate enough without immense computational cost. 

In terms of the Hermitian overlap operator H = 75D, and the gauge action 
S g [U] for a gauge field U, the lattice QCD partition function for two degenerate 
fermion flavours is 

Z = J dU det(H 2 [U, A)e- 8 'M = J dUd^e- 8 '^-*^*, (4) 

where we have used pseudo- fermion fields <fi to approximate the determinant. 
The standard HMC algorithm [10] generates a new gauge field by introducing 
a momentum II, updating the momentum and gauge field along the classical 
trajectory using a numerical integration algorithm (molecular dynamics), and 
finishing with a Metropolis step to ensure that the update of the gauge field 
satisfies detailed balance. The numerical integration must be reversible and 
ergodic, and approximately conserve the action minus the logarithm of the 
Jacobian of the integration. 
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The QCD vacuum contains several homotopy classes, each distinguished by a 
Pontryagin index, the winding number or topological charge. In the continuum 
theory, this topological charge is an integer given by 



On the lattice, discretization errors break the barriers between the topological 
sectors, and the topological charge is no longer well defined. However, it is 
possible to define a modified topological index in a number of different ways, 
each leading to exactly or approximately integer values. These topological 
indexes can be defined either using the gauge fields or fermionic operators. 
In all cases some means must be found to filter out ultra violet fluctuations 
which can lead to a different assignment of topological charge for the same 
configuration. As the lattice spacing shrinks, these artefacts disappear, and 
all the topological indexes reduce to the topological charge in the continuum. 
Here, we shall use solely the overlap definition of the topological index, 



Solutions for gauge field vacua with various topological charges are well known, 
for example instantons [11,12], calorons [13], vortexes [14], membranes [15,16] 
etc. For the sake of simplicity, we shall assume for the following discussion 
that the topological properties of the vacuum are described by instantons and 
anti-instantons, although the physical reality seems to be far more complex. 
The specific model adopted is irrelevant for the arguments given below. 

To ensure ergodicity, a lattice simulation must sample all topological sectors 
in an unbiased manner. In fact, it must also sample all parts of a topological 
sector (which might or might not be connected) in an unbiased manner. Oth- 
erwise one introduces hardly controllable systematic uncertainties. To stress 
this fact let us remind that many aspects of low energy physics are intimately 
connected to topologically non-trivial field configurations. Examples are, e.g., 
the density of low lying Dirac eigenvalues, and thus the chiral condensate, 
and through the latter the pion mass. To sample all topological configura- 
tions in an unbiased manner, the Monte-Carlo method used must allow for 
the creation and destruction of instanton/anti-instanton pairs. Having instan- 
tons fade in and out of the resolution of the topological charge operator is not 
enough. As the production and annihilation of instantons and anti-instanton 
is usually associated with eigenvalues wandering through the Ginsparg- Wilson 
circle towards the origin, or with a crossing from positive to negative eigen- 
values, using methods to suppress very small kernel eigenvalues [17,18] results 
in a number of systematic uncertainties. Therefore, our strategy is to treat 
overlap fermions exactly, despite the large numerical cost. Furthermore, if, in 
a HMC simulation with overlap fermions, kernel eigenvectors with near-zero 




(5) 



Qf = -2 si s n< 5- 



(6) 
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eigenvalues are artificially suppressed, it is very hard to measure the rate of 
creation and annihilation of topological objects and thus the auto-correlation 
time. We aim to perform simulations with frequent topological index changes. 

A topological index change, using the overlap definition, occurs, e.g., when 
one of the eigenvalues of the kernel operator Q changes sign, as is clear from 
the definition in equation (6). Not surprisingly, this leads to a discontinuity in 
the fermion operator, and thus a S— function in the fermionic force leading to 
a change of the kinetic energy by — AS. 

The standard method to incorporate this Dirac 5— function force into the HMC 
is the transmission/reflection algorithm first published in [4] and improved 
by our own work in [5] and [19] (we improved the energy conservation, and 
increased the transmission rate). For simplicity we shall here discuss a spe- 
cific two dimensional algorithm taken from [5], although our final result, in a 
sufficiently large volume, is unaffected by the choice of algorithm (not with- 
standing the improvements of [19]). In this particular model, the momentum 
is changed parallel to two ortho normal basis vectors rji and 772- The energy 
conservation equation reads 

\n\ = \nl - AS, (7) 

and we satisfy it by using the update (with (A, B) denoting the inner product 
of the fields A and B) 



n + = n_ + ( mi vu n_) + n_)) (^1 - ^f^-^ - 1) . 

(8) 

Proof that this algorithm is area conserving and reversible with a suitable 
choice of rji and r/2 can be found in [5]. When the term in the square root is 
positive, we can continue, and the topological charge is changed. When it is 
negative, we have to reflect off the potential wall, and there is no topological 
charge change. Assuming that the momentum is given by a Gaussian distribu- 
tion, the transmission rate, the number of transmission steps divided by the 
total number of transmission and reflection steps is thus for a given AS 

_ jd( m , u)d(rj 2 , uMrn, n) 2 + ( m , n) 2 - 2 AS)e-^ 2 ' 2 -^ 2 / 2 

J d( Vl , U)d(r]2, n)e-^> n ) 2 / 2 -te,nP/2 
=min(l,e- A5 ). (9) 

For a sufficiently large volume, the transmission rate is given by this formula 
for all possible energy and area conserving transmission algorithms. While it 
is possible to improve the transmission rate using non-area conserving algo- 
rithms [19], the dependence on the mass remains the same. To have a large 
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transmission rate, and thus small auto-correlation, it is necessary to reduce 
the action jump, AS. 

In section 2, we demonstrate that AS scales as 0(/i~ 2 ) when the fermion 
determinant is estimated by a single pseudo-fermion. Therefore at low masses, 
topological charge changes are practically impossible. While using multiple 
pseudo-fermions reduces this problem, it does not eliminate it. 

However, it was noted in [6,20] that the change in the actual fermion deter- 
minant, as opposed to the pseudo-fermion estimate of it, scales much better 
with the quark mass, and in section 2 we will argue that the dominant terms 
in the mean discontinuity are 0(1). Therefore, if we were to use the actual 
determinant, rather than the pseudo-fermion estimate, the bad scaling of the 
topological auto-correlation time with the mass would not occur. Of course, 
in practice, this is impossible. 

However, there is another possibility, which is to factorise the determinant into 
two terms: a larger continuous determinant that can be treated with pseudo- 
fermions; and a smaller determinant that contains all the discontinuities and 
which can be treated exactly. In principle, this should combine the low AS of 
the exact determinant, with the better volume scaling of the pseudo-fermion 
estimate. Over the past year and a half, we have explored many different 
factorisation schemes, some inexact (in the sense that the pseudo-fermion 
term was still discontinuous, though the action jump was reduced) and quick; 
others exact but slower. Here, we will concentrate on one of these algorithms, 
which we call algorithm C. We presented preliminary results for algorithm C 
and a second approach, algorithm F, in [21]. We note that Stefan Schaefer 
independently proposed a similar method in [22], though later abandoned 
it [23], partly in view of the difficulties which we shall discuss in section 5. 
We have tested our implementation of this method on 8 3 16 lattices and 12 3 48 
lattices (although here we only present results from the 8 3 16 lattices), and, 
using various tricks, have not encountered any difficulties. There might be 
problems as we move to larger lattice volumes, but we are currently working 
to avoid them. 

One obvious criticism of our approach is that we are focusing on increasing 
the transmission rate, while the auto-correlation time depends on the rate of 
topological charge change, which is the transmission rate multiplied by the 
number of attempted topological charge changes. However, we believe that 
the transmission rate is the key quantity when determining the efficiency of 
the algorithm. It takes time to perform the transmission/reflection step: we 
need to find the gauge field where the kernel operator has a zero eigenvalue; 
the kernel eigenvalues need to be recalculated; and several overlap inversions 
are needed to calculate AS* and the fermionic forces, both for the transmis- 
sion and reflection steps. Each transmission step costs a certain amount of 
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time, but results in a change in the topological charge, and thus the cost is 
justified because it reduces the auto-correlation. When there is no transmis- 
sion/reflection step, there is no topological charge change, but there is also 
no additional cost. For a reflection step, however, we both spend the com- 
puter time and get no tunnelling event. The number of transmission steps 
increases with the volume, because of the higher density of small eigenvalues, 
so we expect the transmission/refection part of the algorithm to dominate. 
The auto-correlation time should decrease as the number of successful topo- 
logical index changes increases. An efficient algorithm will therefore have as 
few reflections as possible, and therefore a maximised transmission rate. 

In this paper we demonstrate that by using these methods we can improve the 
tunnelling of the overlap Hybrid Monte Carlo by a significant factor, and the 
cost of the factorisation is far less than the gain from reduced auto-correlation. 
This gain will increase at smaller mass, while the cost does not. Using this 
method, even at small mass, the transmission/reflection part of the algorithm 
will therefore allow for frequent topological tunnelling. 

This article is organised as follows: in section 2, we present a model that 
approximates the scaling of the action jump with the quark mass for both the 
actual determinant and the pseudo-fermion estimate. Section 3 describes our 
proposed factorisation, and the (standard) method which we will compare it 
against in our tests. Section 4 describes our numerical results, and, after a 
few comments on energy conservation and the fermionic force in section 5, we 
conclude in section 6. 



2 The scaling of the action jump with the quark mass. 



In subsequent sections, we will present a method which vastly improves the 
scaling of the action jump with the quark mass. To motivate our methods and 
interpret our results we have developed a simple model to estimate the scaling 
of the action jump when estimated by pseudo-fermions and for the actual 
fermion determinant. We stress that while this somewhat simple model shows 
properties which we observe numerically, our algorithm does not depend on 
this model in any way. 

The discontinuity in the action for the pseudo-fermion discontinuity is [5] 

= (1 " /i 2 ) (01 ^(75 m M + m M 7 5 )^r 10) 2e(A_), (10) 
where ip is the Kernel eigenvector whose eigenvalue is zero, and the — subscript 
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indicates the overlap operator just before the crossing and the + subscript just 
after the crossing, and 

{</>) = H \ V ), (11) 

where H is the overlap operator at the start of the trajectory. If the eigen- 
vectors of are \9), with eigenvalues 1 + \i + (1 — fi)e td , then it is easy to 
show that H \9) = (1 + ^+ (1 -/i)e iS )7 5 \6), and that H 2 \0) = (2 + 2/i 2 +(l- 
li 2 )(e %e + e~ l9 )) \9). Inserting the complete set of the overlap eigenvectors into 
equation (10) gives 

AS =2e(A_) <r/|££££ 7 5 \9 ) (6 \e + ) (9 + \ l5 |V>> ty\9-) (9-\9' Q ) (9' \ l5 \ v ) 

1 + fj, + (1 - fx)e ie ° 1 + // + (1 -fi)e- ie 'o 

2 + 2/i 2 + (1 - /i 2 )(e^+ + e~ ie +) 2 + 2/i 2 + (1 - p 2 )(e ie - + e~ id -) + '°" 

(12) 

To proceed further, in our efforts to calculate (AS") , we need to make a number 
of simplifying assumptions. Firstly, we need to simplify the matrix elements 
obtained from the inner products of the eigenvectors. We assume, based on 
the normalisation of the vectors, that we can treat terms such as (t]\9) as 
inversely proportional to the square root of the volume, and that the other 
matrix elements are inversely proportional to the volume. We also assume that 
we can average the scalar products so that they are analytic functions of the 
eigenvalues, so = 4(9), ((ip\^\9)) = ^ 5 (0), ((^ 2 » = x(0x)x%), 

and ((77I75I6 1 )) = %(^)- This incorporates the assumption that all the ma- 
trix elements are statistically independent. In particular, this means that the 
molecular dynamics has progressed sufficiently far that there is no correlation 
between \9 ) and \9±). While this might seem improbable, we stress that we 
make this assumption only to simplify the presentation. The argument be- 
low, at a qualitative level, only requires that the average value of the product 
of the matrix elements is an analytic function of the eigenvalues. Secondly, 
we assume that we can neglect the zero modes of the overlap operator, since 
the density of zero modes scales as W, where V is the lattice volume, while 
elements of the matrix \x~) (x-\ scale inversely with the volume. We also 
assume that the density of overlap eigenvalues, p(9), is a real and analytic 
function in the complex plane, that it has a well defined non-infinite value on 
the circle of infinite radius and that it is also an analytic function of the quark 
mass PI The assumption of analyticity with respect to the quark mass should 



1 Attempts to construct the eigenvalue density from the fermion weight, Yli Y2i (2 + 
2fi 2 + 2(1 — fi 2 ) cos#j), and the gauge action, which, following [24] can be written 
as Si( 2 + 2 ^ +2(1— )cos9i) £ a -j b ecause Q f problems constructing the Jacobian 
\dU /d0i\. Assuming that this Jacobian can be treated as a constant and that there is 
no correlation between the eigenvalues, following [25], fails in the p-regime because 
it leads to a vanishing density of zero eigenvalues at zero mass, which would lead 



7 



hold in large enough lattice volumes, but probably not, for example, in the 
transition between the p- and e-regimes (if this is a phase-transition rather 
than a crossover), which are known to have different eigenvalue densities. So, 
we could expect a significantly different action jump between these different 
regimes, although the action jump within the various regimes of QCD should 
vary smoothly with the mass. 

Clearly these are severe assumptions (without which is is difficult to see how to 
proceed), so results from this model should only be seen as tentative, and need 
to be confirmed numerically. Using these simplifications, we can now evaluate 
each term in equation (12) separately. We achieve this by substituting z = e ld 
and integrating along the unit circle in the complex plane. So, 



r ' d9p(9)r ]5 (9)x(0) \(l + p) + e id (l-p) 
Jo 1 

I -p(6) V5 (9) X (9) [l+p + z(l-p)] = 



2ir(l + ii)r) 5 (-ioo)x(-ioo)p(-ioo), (13) 



and 



2tt 1 

1 + p z + 2(1 — p z ) cosB 

1-p 2 



= <t> dz- 



Aip 



1 - p 2 )z + 1 + p 2 - 2/i 

1 



(1 - p 2 )z + 1 + p 2 + 2p 



p(9) X (o)m 



-p{-l log( -2-))x(-« l0g( 7—2-)) 



Ap 



1-P 2 



1-P 2 

log(- 



i -p 2 



(14) 



Inserting the results of equations (13) and (14) into (12) gives the dominant 
mass contribution to AS as 0(p~ 2 ). 

For the actual determinant, as opposed to the pseudo-fermion estimate, the 
action jump is [6,20] 



AS = 21ogll + 2e(A_)(V>| — 



(15) 



to a zero chiral condensate through the Banks-Casher relation [26]. 
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Using the same assumptions as before, we write 



W A W = £ M 7 5 \e-) (e-M t + „ + ( 1 _ „ )e ,_ . (w) 

Replacing the sum in equation (16) with an integral, and using the same 
contour as before, we note that the pole at 1 + fi + (1 — fi)z = lies outside 
the contour and obtain 

(<V>i ^- m) oc j^pHoo) (17) 

Hence the discontinuity in the actual determinant when there is a topological 
charge change (and if our numerous assumptions are valid) does not diverge 
in the quark mass. By inspection of equation (15), it can be shown that it is 
also independent of the lattice volume. 

We outlined earlier that the probability of transmission when there is an at- 
tempted topological charge change is min(l, e~ AS ), and numerical evidence 
indicates that, with a pseudo-fermion estimate of the determinant, AS is al- 
most invariably positive!!] The auto-correlation time for topological quantities, 
most obviously the susceptibility, clearly depends strongly on this transmission 
rate. For the sake of being definite, we will assume that the auto-correlation 
time is inversely proportional to the rate of topological charge changes, which 
itself is proportional to the transmission rate. Although this argument is some- 
what over-simplified, numerical studies suggest that it is not unreasonable. 
Thus dynamical overlap simulations where the determinant is simulated with 
pseudo-fermions have an auto-correlation time that scales as e~ a ^ . Because 
all lattice Dirac operators have the same continuum limit, we expect similar 
behaviour for other formulations of lattice QCD, although, because these do 
not have a well defined topological index, the problem may not be as apparent 
as with overlap fermions. 

One possible way of reducing the problem is to use multiple pseudo-fermions [6,27]. 
For example, simulating the fermion determinant using 



teH* = e™*U&M + ™%$M (18) 



2 This is suggested from considerations concerning reversibility. The number of 
transmission steps with positive AS should be equal to the number of transmission 
steps with negative AS, because for a reversed trajectory AS — > — AS, and there 
should be no change in the number of topological charge changes. The number of 
topological charge changes is the number of attempted topological charge changes 
times the probability of transmission. The probability of transmission when AS* is 
negative is one. Therefore the ratio of the number of attempted topological charge 
changes for positive AS to the number for negative AS* is inversely proportional to 
the probability of a topological charge change when AS* is positive. 
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gives 



AS = a^ 2 + (3^ + 7 (/i ' f f + 0( A (19) 

where a and j3 are constants which need to be determined. In principle, by 
adding more pseudo-fermions, one can reduce the action discontinuity con- 
siderably. However, as the mass decreases, more pseudo-fermions need to be 
added, increasing the size of the fermionic force (once the optimum number 
of pseudo-fermions has been passed), the action jump, and the time required 
for each trajectory. Therefore this method, while useful, cannot produce a AS 
that is independent of the quark mass. 

An alternative approach is to factorise the discontinuity from the pseudo- 
fermion term, and treat the discontinuous part of the action exactly. An ob- 
vious way of doing this (although we do not use this method) is to use a 
projector P constructed from the smallest n kernel eigenvectors. We can write 



det[H] =det 
det 



1 - p P5P PH t 1 - p > 



det - ,.,„, 

PHP + (1 - P)H{1 - P) - (1 - P)HP—^—PH{\ - P) 



1 - (1 - P)HP- 
1 

PHP J 



(20) 



If the projectors are exact, so that P 2 = P, then the first two determinants 
on the RHS of equation (20) are 1 and the third determinant can be further 
factorised into the product of 

det[PHP + (1 - P)] (21) 

and 

det[(l - P)H(1 - P) - (1 - P^Pj^PHil - P) + P). (22) 

This is, of course, the familiar Schur decomposition. The projectors P could in 
principle be constructed using P = 1 — YJ}=o l^i) The first determinant, 
(21), is then continuous when a kernel eigenvalue crosses zero, and can be 
treated with pseudo-fermions. The second determinant, given in equation (22), 
is discontinuous, but the matrix is small for the determinant to be calculated 
exactly, and explicitly included in the action without pseudo-fermions. Of 
course, the determinant given in equation (21) is discontinuous when there 
is a level crossing between the nth and (n + l)th eigenvalues. To avoid this, 
without resorting to another transmission/reflection stepJjO it is necessary to 



3 Ergodicity could here be preserved by constructing the projector from a different 
number of eigenvalues on different trajectories; as long as care is taken so that this 
does not violate reversibility. 
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use projectors which are a function of the eigenvalue, which means that it 
is impossible to satisfy P 2 = P. While this decomposition is still possible, 
we cannot neglect the first two determinants in equation (20), and the third 
determinant cannot be factorised so easily. For this reason, we prefer to use the 
simpler means to achieve the same end described in the next section (algorithm 
C). 



3 Determinant factorisation 



3.1 Algorithm A 



Algorithm A was introduced in the context of dynamical overlap simulations 
by DeGrand and Schaefer in [6]. It reduces the noise of the pseudo-fermion 
approximation by introducing n additional pseudo-fermions, following the 
method of Hasenbusch [27]: 

det(lfM) = det |M det . . . det(ff «) (23) 

S pf ^{H^H^H^)^ + 0^(/i 2 ) J ff( /Ul )- 2 /J(/i 2 )0 2 + . . . + 
4 +1 tf(/x„)- 2 0n + i (24) 

This decomposition was originally introduced to precondition the fermionic 
force, and as such a few additional pseudo-fermions are needed for an efficient 
algorithm at low mass. As discussed in the previous section, it can also be 
used to effectively reduce the action discontinuity. We use algorithm A with 
one additional pseudo-fermion, as a benchmark against which we will compare 
the results of our new method. 



3.2 Algorithm C 



We factorise the overlap determinant using an eigenvalue projector which de- 
pends on a continuous function a of the kernel eigenvalue. 



H c =(1 + ^) 7 5 + (1 - /i)e(Q) 1 - £ a(Ai) \A) {A 



detH =det[H c ] det 



l + (l- fl )(e(Q)^a(X i )\^)(^\) w - 
i ti c\ 



(25) 
(26) 



where a(0) = 1 and a(X > A) = for a carefully tuned eigenvalue cutoff A. 
The first of the determinants in equation (26) is a continuous function, so it 
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will not contribute to the action jump. The second can be written as 



D c 



det 



l + (l-/i) (e(Ai)a(Ai)) -~- 



det 



+ (1 - /i) (e(Ai)a(Ai)) -~- |^) 



(27) 



This is the determinant of an n x n matrix, which can be calculated easily 
using, for example, LU decomposition [28]. It has been proposed to account 
for this determinant by re-weighting, or by including the correct expression in 
the Hybrid Monte-Carlo Metropolis step, but this leads to a low acceptance 
rate and hence an algorithm that is only a small improvement over algorithm 
A [22]. We propose adding the logarithm of this determinant to the fermion 
action. This obviously costs more per trajectory, both because of the addi- 
tional term in the fermionic force, and because, since Dc is discontinuous at 
the eigenvalue crossing, we would need to track the eigenvalues of the ker- 
nel operator and use the reflection/transmission routine to account for the 
discontinuity in the action0 However, including the additional term in the 
action gains by maintaining a high Monte-Carlo acceptance rate, and we be- 
lieve that this gain outweighs the losses, at least on moderate size lattices. In 
our numerical tests, we used a(A) = e(A)Z(A), where Z is the same Zolotarev 
rational approximation we use to approximate the sign function [9] . With this 
choice, there is no need to project out the eigenvalues when calculating Hq- 
In principle, the choice of a is arbitrary, although we should seek to maximise 
the HMC acceptance rate and avoid exceptional configurations. 

The fermion action is thus (with no additional Hasenbusch pseudo-fermion 



The determinant can be differentiated easily, by differentiating each compo- 
nent of the matrix, and multiplying it by the co- factor of that component. As 



4 It might be possible to remove some of this additional cost because, as outlined 
in section 2 and demonstrated in section 4, we expect this factorisation to reduce 
AS, and if it reduces it enough (to, for example, ~ 0.5), then it might be possible 
to neglect the transmission/reflection step, accepting that the algorithm will not 
conserve energy, using the final HMC metropolis step to ensure detailed balance. 
This would lead to a lower HMC acceptance rate, but the time per trajectory would 
be faster: whether neglecting transmission/reflection would be efficient will depend 
on the size and spread of AS and the frequency of attempted topological charge 
changes. 



terms) : 




c 




12 



an illustration, for a 3 x 3 matrix, we write 



d_ 



log(det[a]) 



1 da 



ii 



det [a] dr 



det 



1 

a 22 a 23 
a 32 a 33 



+ 



1 da 



12 



det [a] dr 



det 



1 

a 2 i a 23 
a 3 i a 33 



+ . . . . 



(29) 



We differentiate the eigenvectors and eigenvalues following established meth- 
ods [5,29]. 



4 Numerical results 



We tested algorithms A and C on a 8 3 16 lattice at masses /i = 0.03, /i = 0.04 
and n = 0.05, corresponding to pion masses of 450 — 550 MeV (calculated from 
the pseudo-scalar correlator) at lattice spacings of about a = 0.11 fm (calcu- 
lated using r = 0.49 [30]). These masses were as low as we could achieve on 
these lattice volumes while remaining in the topologically active p-regime. We 
used two steps of stout smearing with an otherwise standard Wilson kernel 
at smearing parameter p = 0.1, and for these simulations we used no Sexton- 
Weingarten integration [31] (which should not affect the size of the action 
jump). We used the Omelyan integrator [32]. We used one additional flavour 
of Hasenbusch fermions for both algorithms A and C, with the the Hasen- 
busch mass, fi\ the same for algorithms A and C, but varying according to 
quark mass in an (admittedly very crude) attempt to optimise AS for algo- 
rithm A. In fairness, we want to stress that our choice of [i\ turned out to 
be sub-optimal at /i = 0.04. We generated between 120 and 500 trajectories 
for each of our ensembles, and all our ensembles [p, = 0.04, algorithm C was 
the smallest in each case) had over 170 attempted topological index changes. 
We ran trajectories of length 0.5 with 20 integration steps, which achieved an 
Monte Carlo acceptance rate of over 90% in all cases. 

The mean values of the action jump at the topological sector boundary, AS, 
standard deviation of AS, and the transmission rates are given in table 1, 
together with the Hasenbusch mass fj,i. The distribution of AS is shown in 
figure 1. 

It can be seen that algorithm C has a far lower action jump than algorithm 
A, and a much smaller spread of AS. This corresponds to a much larger 
transmission rate. We note that the pion masses used are still large, and 
we expect a far larger gain at more realistic masses. As expected, the mass 
dependence of the action jump for algorithm C is not significant, while for 
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Table 1 

Average values of AS (left) standard deviations of AS (middle) and transmission 
rates (right) for algorithms A and C. 
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Fig. 1. The distribution of AS for masses [i = 0.03, fi 
algorithms A and C. 



0.04, and fi = 0.05 for 



algorithm A it has a strong dependence on the quark mass. 

The increase in the transmission rate corresponds to a reduction in the topo- 
logical auto-correlation time of about a factor of 8. The extra cost of the 
additional inversions meant that it took twice as long to generate the trajec- 
tory (measured on the /x = 0.04 ensemble), meaning that the overall gain of 
this method was roughly a factor of 4. This was without using any Sexton- 
Weingarten methods to place the small determinant on a reduced time-scale, 
which we believe will be effective, and at a relatively large quark mass. We 
expect larger gains for simulations at larger lattice volume and smaller mass. 
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5 Volume scaling and energy conservation 

Another attempt along similar lines has run into difficulties with energy con- 
servation [23]. Part of the problem encountered in these other studies could 
have been avoided by using the improved eigenvector differentiation of [29]. 
Still, part of the problem could be caused by other issues which affect the 
size and stability of the fermionic force. It is therefore appropriate to make a 
few remarks about energy conservation, and in particular the performance on 
larger volumes. 

The obvious disadvantage of our method is the need to perform numerous 
additional inversions of the overlap operator in the calculation of Dq- The 
number of extra inversions will depend on the density of kernel eigenvalues, 
and the value chosen for the cut-off A. In this section we will assume that 
to maintain a fixed number of small eigenvalues below the cut-off, we would 
have to reduce A inversely proportional to the lattice volume, which seems 
to us to be the worst case scenario, as well as being suggested by the Banks- 
Casher relation [26] (we do not yet have any data to study this question). An 
option would be to increase the number of additional inversions as the volume 
increases. In practice, we have combined the two approaches so that we reduce 
A but not to such an extent that the number of inversions in the small matrix 
remains constant. In this section, we will investigate the effect of reducing A 
on the fermionic force and energy conservation. 

For the leapfrog QPQ integration scheme (where the gauge field is updated 
by a half step, the momentum by a full step, and the gauge field by another 
half step), there is a shadow Hamiltonian, H', which is exactly conserved by 
the molecular dynamics for a trajectory of fixed length [33] 



where S is the action, p represents the molecular dynamics momentum, and 
the prime indicates its derivative with respect to the gauge field. From this 
expression, it is easy to read off the size of the energy conservation violation. 
In our case it is clear that da/d\ is proportional to 1/A, d 2 a/d\ 2 is propor- 
tional to 1/A 2 and so on, and if a is continuous, we could continue this series 
indefinitely (bearing in mind that a and all its derivatives are zero, or close 
to zero, for A > A). Thus S' contains terms of order 1/A, S" terms of order 
1/A 2 and 1/A, and so on. It is clear that there is a problem with the volume 
scaling, particularly from the higher order derivatives. We use the Omelyan 



H' —H + — (-p 2 S" + 2S' 2 ) 5t 2 + 




5760 



(30) 
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integrator, [32,34], 



U' = e aTU ^e^ TS '^e^ 2a)TU ^e~^ s '^e aTU ^U, (31) 
which conserves the shadow Hamiltonian 

H' = H + St* ( ^\f + W - + 0(6r% (32) 

By tuning a to \ — -j= we are able to remove the troublesome S" terms from 
the leading order correction to the shadow Hamiltonian. Obviously, this is not 
a complete solution to the problem, and we are in the process of developing a 
method which will allow the factorisation used here to be extended to larger 
volumes [35]. However, this method is more than sufficient for our current runs 
on 8 3 16 and 12 3 48 lattices, where we achieve close to 100% acceptance in both 
cases. 

Obviously, another means to reduce this problem would be to reduce the 
density of small kernel eigenvalues, either through additional smearing or the 
tunnelling HMC algorithm of [25] . We also note that deflation methods can be 
usefully employed when there are multiple right hand sides to solve. By using 
the eigenvectors from the previous molecular dynamics step as a initial guess, 
we are able to calculate the eigenvectors of H c to a suitable precision with 
only a few calls to a low accuracy sign function on our 8 3 16 lattices. Unlike 
deflation methods with the overlap operator itself, there is no large shift in 
the eigenvectors of He when there is a topological charge change, meaning 
that the calculation of the overlap eigenvalues consumes only a small fraction 
of the total computer time per trajectory. 



6 Conclusion 



The topological auto-correlation time limits dynamical lattice QCD simula- 
tions, particularly those involving overlap fermions. We have demonstrated 
that by factorising the fermion determinant into a large continuous matrix 
approximated by pseudo-fermions and a smaller discontinuous part treated 
exactly, we are able to observe topological tunnelling at all masses. With our 
new method, the discontinuity in the pseudo-fermion action is independent 
of the quark mass, and we expect it to be independent of the lattice volume. 
There may be problems affecting the energy conservation and HMC accep- 
tance rate on larger volumes, but we expect that these can be resolved. 
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